Search CORE

61 research outputs found

Direct Inter-Process Communication (dIPC): Repurposing the CODOMs architecture to accelerate IPC

Author: Etsion Yoav
Jordà Peroliu Marc
Navarro Nacho
Valero Cortés Mateo
Vilanova Lluis
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

In current architectures, page tables are the fundamental mechanism that allows contemporary OSs to isolate user processes, binding each thread to a specific page table. A thread cannot therefore directly call another process's function or access its data; instead, the OS kernel provides data communication primitives and mediates process synchronization through inter-process communication (IPC) channels, which impede system performance. Alternatively, the recently proposed CODOMs architecture provides memory protection across software modules. Threads can cross module protection boundaries inside the same process using simple procedure calls, while preserving memory isolation. We present dIPC (for "direct IPC"), an OS extension that repurposes and extends the CODOMs architecture to allow threads to cross process boundaries. It maps processes into a shared address space, and eliminates the OS kernel from the critical path of inter-process communication. dIPC is 64.12× faster than local remote procedure calls (RPCs), and 8.87× faster than IPC in the L4 microkernel. We show that applying dIPC to a multi-tier OLTP web server improves performance by up to 5.12× (2.13× on average), and reaches over 94% of the ideal system efficiency.We thank Diego Marr´on for helping with MariaDB, the anonymous reviewers for their feedback and, especially, Andrew Baumann for helping us improve the paper. This research was partially funded by HiPEAC through a collaboration grant for Lluís Vilanova (agreement number 687698 for the EU’s Horizon2020 research and innovation programme), the Israel Science Fundation (ISF grant 769/12) and the Israeli Ministry of Science, Technology and Space.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

System noise, OS clock ticks, and fine-grained parallel applications

Author: Dan Tsafrir
Dror G. Feitelson
Scott Kirkpatrick
Yoav Etsion
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2005
Field of study

As parallel jobs get bigger in size and finer in granularity, “system noise ” is increasingly becoming a problem. In fact, fine-grained jobs on clusters with thousands of SMP nodes run faster if a processor is intentionally left idle (per node), thus enabling a separation of “system noise ” from the com-putation. Paying a cost in average processing speed at a node for the sake of eliminating occasional processes delays is (unfortunately) beneficial, as such delays are enormously magnified when one late process holds up thousands of peers with which it synchronizes. We provide a probabilistic argument showing that, under certain conditions, the effect of such noise is linearly pro-portional to the size of the cluster (as is often empirically observed). We then identify a major source of noise to be indirect overhead of periodic OS clock interrupts (“ticks”), that are used by all general-purpose OSs as a means of main-taining control. This is shown for various grain sizes, plat-forms, tick frequencies, and OSs. To eliminate such noise, we suggest replacing ticks with an alternative mechanism we call “smart timers”. This turns out to also be in line with needs of desktop and mobile computing, increasing the chances of the suggested change to be accepted. 1

CiteSeerX

Crossref

Analysis of the Task Superscalar architecture hardware design

Author: Badia Sala Rosa Maria
Etsion Yoav
Jiménez González Daniel
Yazdanpanah Ahmadabadi Fahimeh
Álvarez Martínez Carlos
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

In this paper, we analyze the operational flow of two hardware implementations of the Task Superscalar architecture. The Task Superscalar is an experimental task based dataflow scheduler that dynamically detects inter-task data dependencies, identifies task-level parallelism, and executes tasks in the out-of-order manner. In this paper, we present a base implementation of the Task Superscalar architecture, as well as a new design with improved performance. We study the behavior of processing some dependent and non-dependent tasks with both base and improved hardware designs and present the simulation results compared with the results of the runtime implementation.This work is supported by the Ministry of Science and Technology of Spain and the European Union (FEDER funds) under contract TIN2007-60625, by the Generalitat de Catalunya (contract 2009-SGR-980), and by the European FP7 project TERAFLUX id. 249013, http://www.tera ux.eu. We would also like to thank the Xilinx University Program for its hardware and software donations.Postprint (author’s final draft

Elsevier - Publisher Connector

UPCommons. Portal del coneixement obert de la UPC

Author: Ben-Yehuda Muli
Etsion Yoav
Navarro Nacho
Valero Cortés Mateo
Vilanova Lluís
Publication venue
Publication date: 30/09/2003
Field of study

Today's complex software systems are neither secure nor reliable. The rudimentary software protection primitives provided by current hardware forces systems to run many distrusting software components (e.g., procedures, libraries, plugins, modules) in the same protection domain, or otherwise suffer degraded performance from address space switches. We present CODOMs (COde-centric memory DOMains), a novel architecture that can provide finer-grained isolation between software components with effectively zero run-time overhead, all at a fraction of the complexity of other approaches. An implementation of CODOMs in a cycle-accurate full-system x86 simulator demonstrates that with the right hardware support, finer-grained protection and run-time performance can peacefully coexist.We would like to thank Lluc Alvarez, Javier Cabezas, Ana Jokanovic, Marc Jorda, Carlos Villavieja, our shepherd Mohit Tiwari and the anonymous reviewers for their help and comments on this paper. This work has received funding from: the European Commission through TERAFLUX (FP7-249013) and RoMoL (GA-321253); the Spanish Government through Programa Severo Ochoa (SEV-2011-0067); the Spanish Ministry of Science and Technology through TIN2007-60625 and TIN2012-34557; the Israel Science Foundation (grant 769/12 and equipment grant 1719112); and the Ministry of Science and Technology, Israel. Yoav Etsion was supported by the Center for Computer Engineering at the Technion.Peer ReviewedPostprint (published version

Hiroshima Shudo University Academic Repository / 広島修道大学学術リポジトリ

Crossref

UPCommons. Portal del coneixement obert de la UPC

LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

Author: Alvarez Carlos
Bautista Leonardo
Becker Tobias
Billung-Meyer Gunnar
Carpenter Paul
Christmann Wolfgang
Cristal Adrian
De La Cruz Raul
Dubhashi Devdatt
Etsion Yoav
Felber Pascal
Fetzer Christof
Gaydadjiev Georgi
Göttel Christian
Hadar Elad
Hagemeyer Jens
Jimenez Daniel
Jungeblut Thorsten
Kaiser Martin
Klawonn Frank
Krupop Stefan
Kucza Nils
Madonar Sergi
Martorell Xavier
Mihklafi Amani
Mudge Trevor
Mudge Trevor
Pasin Marcelo
Pericàs Miquel
Pnevmatikatos Dionisios N.
Porrmann Mario
Port Oron
Rocha Isabelly
Salami Behzad
Salomonsson Hans
Schiavoni Valerio
Trancoso Pedro
Unsal Osman S.
vor dem Berge Micha
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Chalmers Research

Publications at Bielefeld University